Modeling text with generalizable Gaussian mixtures

Authors

  • Lars Kai Hansen
  • Sigurður Sigurðsson
  • Thomas Kolenda
  • Finn Årup Nielsen
  • Ulrik Kjems
  • Jan Larsen
Abstract

We apply and discuss generalizable Gaussian mixture (GGM) models for text mining. The model automatically adapts its complexity to a given text representation. We show that the generalizability of these models depends on the dimensionality of the representation and the sample size. We discuss the relation between supervised and unsupervised learning in text data. Finally, we implement a novelty detector based on the density model.
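As a rough illustration of the two ideas in the abstract (complexity chosen from data, and novelty detection from a density model), the sketch below fits one-dimensional Gaussian mixtures with plain EM, picks the number of components by held-out log-likelihood, and flags points whose log-density falls below a low quantile of the training scores. This is not the authors' GGM procedure: the synthetic data, the hold-out selection rule, the 1% threshold, and all function names (`fit_em`, `log_density`, `is_novel`) are assumptions made for the example.

```python
import math
import random


def component_logs(x, weights, means, variances):
    """Log of (weight * Gaussian density) at x, one entry per component."""
    return [
        math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]


def log_density(x, model):
    """Log mixture density at scalar x, using log-sum-exp for stability."""
    logs = component_logs(x, *model)
    top = max(logs)
    return top + math.log(sum(math.exp(t - top) for t in logs))


def fit_em(data, k, iters=100, var_floor=1e-3):
    """Plain EM for a 1-D Gaussian mixture with k components."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: means at evenly spaced quantiles of the data.
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    centre = sum(data) / n
    spread = max(sum((x - centre) ** 2 for x in data) / n, var_floor)
    weights, variances = [1.0 / k] * k, [spread] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            logs = component_logs(x, weights, means, variances)
            top = max(logs)
            exps = [math.exp(t - top) for t in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)
            weights[j] = nj / n
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return weights, means, variances


def mean_log_lik(data, model):
    """Average log-density of the data under the model."""
    return sum(log_density(x, model) for x in data) / len(data)


# Synthetic stand-in for a 1-D text feature: two well-separated clusters.
random.seed(0)
data = [random.gauss(-3.0, 0.5) for _ in range(200)]
data += [random.gauss(3.0, 0.5) for _ in range(200)]
random.shuffle(data)
train, valid = data[::2], data[1::2]

# Choose the component count that generalizes best to held-out data.
scores = {k: mean_log_lik(valid, fit_em(train, k)) for k in range(1, 5)}
best_k = max(scores, key=scores.get)
model = fit_em(train, best_k)

# Novelty threshold: the 1% quantile of training log-densities (assumed).
train_scores = sorted(log_density(x, model) for x in train)
threshold = train_scores[int(0.01 * len(train_scores))]


def is_novel(x):
    """Flag x as novel when the model assigns it unusually low density."""
    return log_density(x, model) < threshold
```

Calling `is_novel(50.0)` flags a point far from both clusters, while a point at a cluster centre such as `3.0` is not flagged. Real text representations are multi-dimensional, so a practical version would need full or diagonal covariance matrices and a lower-dimensional document representation.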

Similar resources

Modeling high-level information by using Gaussian mixture correlation for GMM-UBM based speaker recognition

The Gaussian mixture model-universal background model (GMM-UBM) has been dominant in text-independent speaker recognition tasks. However, the conventional GMM-UBM method assumes that each Gaussian mixture is independent and ignores the fact that within Gaussian mixtures there exist useful high-level speaker-dependent characteristics, such as word usage or speaking habits. Based on the G...

Modeling Audio and Visual Cues

Audio-visual event detection aims to identify semantically defined events that reveal human activities. Most previous literature focused on restricted highlight events, and depended on highly ad-hoc detectors for these events. This research emphasizes generalizable robust modeling of single-microphone audio cues and/or single-camera visual cues for the detection of real-world events, requiring ...

Erratum to: Local Statistical Modeling via a Cluster-Weighted Approach with Elliptical Distributions

Cluster-weighted modeling (CWM) is a mixture approach to modeling the joint probability of data coming from a heterogeneous population. Under Gaussian assumptions, we investigate statistical properties of CWM from both theoretical and numerical points of view; in particular, we show that Gaussian CWM includes mixtures of distributions and mixtures of regressions as special cases. Further, we in...

Comparison of Clustering Algorithms for Speaker Identification

In this paper we consider the problem of text-independent speaker identification, which belongs to acoustic recognition research. Many different techniques have been presented over the past several decades. A state-of-the-art technique uses Gaussian mixture models (GMM) to model the distribution of speaker data represented by MFCC [1] or LPCC [2] features. The classification is obtained by choosing the speaker cl...

Infinite Dirichlet Mixtures in Text Modeling

This paper proposes a Dirichlet process mixture modeling approach to Dirichlet Mixtures (DM). Endowing a prior distribution on an infinite number of mixture components, this approach yields an appropriate number of components as well as their parameters at the same time. Experimental results on amino acid distributions and text corpora confirmed this effect and showed comparative performance on...

Publication date: 2000